-
Notifications
You must be signed in to change notification settings - Fork 502
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
exp/orderbook: Represent assets in orderbook graph as int32 instead of strings #4102
Conversation
exp/orderbook/graph.go
Outdated
if len(graph.vacantIDs) > 0 { | ||
id = graph.vacantIDs[len(graph.vacantIDs)-1] | ||
graph.vacantIDs = graph.vacantIDs[:len(graph.vacantIDs)-1] | ||
graph.idToAssetString[id] = assetString | ||
} else { | ||
id = int32(len(graph.idToAssetString)) | ||
graph.idToAssetString = append(graph.idToAssetString, assetString) | ||
graph.venuesForBuyingAsset = append(graph.venuesForBuyingAsset, nil) | ||
graph.venuesForSellingAsset = append(graph.venuesForSellingAsset, nil) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this requires further clarification. Particularly as to why there are multiple ways to obtain the ID.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, do we really need the vacantIDs
mechanism? It seems to complicate things and it's not obvious (to me at least) what gain it brings.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the purpose of vacantIDs
is to make sure the assets array does not waste any cells.
let's say we start out with an empty graph . the assets array will be empty in that case.
then we add the following assets:
0 -> 'usd'
1 -> 'eur'
2 -> 'chf'
3 -> 'sek'
now, we remove all offers and pools which have the chf asset so we can remove it. at this point the array looks like:
0 -> 'usd'
1 -> 'eur'
2 -> ''
3 -> 'sek'
the cell at index 2 is vacant and we can reuse it the next time we add a new asset, for example 'yen'
0 -> 'usd'
1 -> 'eur'
2 -> 'yen'
3 -> 'sek'
without the vacantIDs mechanism we will either have to add 'yen' to index 4 and forever let cell 2 to be empty, or we could try to reshuffle the mapping so there are no empty cells when remove 'chf' but that would be an expensive operation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I see. Does wasting cells have a big performance impact? I presume we would be consuming a bit more memory, but maybe that's acceptable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there is a small performance impact because we iterate over all the cells in the search algorithm here:
Line 141 in 41b735e
for currentAsset := int32(0); currentAsset < totalAssets; currentAsset++ { |
For the empty cells we skip them through this check:
currentAmount := bestAmount[currentAsset]
if currentAmount == 0 {
continue
}
I think a few empty cells wouldn't matter but if we didn't eventually fill in the empty cells I think the performance would slowly get worse over time as we accrue more and more empty cells until horizon restarts
// we assign id to asset | ||
graph.idToAssetString = append(graph.idToAssetString, assetString) | ||
graph.venuesForBuyingAsset = append(graph.venuesForBuyingAsset, nil) | ||
graph.venuesForSellingAsset = append(graph.venuesForSellingAsset, nil) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't we clear venuesForBuyingAsset
and venuesForSellingAsset
when assigning to a vacant id? It seems we don't do that in maybeDeleteAsset
either.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in order to get included in the vacant id list it is a necessary condition that graph.venuesForBuyingAsset[asset] and graph.venuesForSellingAsset[asset] are empty:
func (graph *OrderBookGraph) maybeDeleteAsset(asset int32) {
buyingEdgesEmpty := len(graph.venuesForBuyingAsset[asset]) == 0
sellingEdgesEmpty := len(graph.venuesForSellingAsset[asset]) == 0
if buyingEdgesEmpty && sellingEdgesEmpty {
delete(graph.assetStringToID, graph.idToAssetString[asset])
// When removing an asset we do not resize the idToAssetString array.
// Instead, we allow the cell occupied by the id to be empty.
// The next time we will add an asset to the graph we will allocate the
// id to the new asset.
graph.idToAssetString[asset] = ""
graph.vacantIDs = append(graph.vacantIDs, asset)
}
}
return id | ||
} | ||
// before creating a new int32 asset id we will try to use | ||
// a vacant id so that we can plug any empty cells in the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
great design, just curious, would storing nil in idToAssetString equate to same result as maintaining separate vacancy state, i.e., iterate for idToAssetString=nil
instead, perhaps for less code, but just wondering.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, that's true we could avoid having a vacantIDs
list entirely if we scan through idToAssetString
to find the first empty cell. in the worst case if there are no empty cells we have to scan through the entire array before realizing we have to append to the end. Having vacantIDs
makes the operation of adding a new asset faster
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
great insight and improvement.
…f strings (#4102) Represent assets in orderbook graph as int32 instead of strings
…f strings (stellar#4102) Represent assets in orderbook graph as int32 instead of strings
PR Checklist
PR Structure
otherwise).
services/friendbot
, orall
ordoc
if the changes are broad or impact manypackages.
Thoroughness
.md
files, etc... affected by this change). Take a look in the
docs
folder for a given service,like this one.
Release planning
needed with deprecations, added features, breaking changes, and DB schema changes.
semver, or if it's mainly a patch change. The PR is targeted at the next
release branch if it's not a patch change.
What
Previously we were representing the adjacency list for the graph as a mapping from asset string to the list of offers / pools which buy / sell that asset.
Now, every asset is assigned an integer id and the adjacency list is represented as an array where the integer id is used to index into the array. This new data structure is much more compact because the asset strings were very lengthy. The new data structure is also much faster because array indexing is significantly faster than looking up keys in a map.
Why
New benchmark:
Old benchmark
The new code reduced the latency from 74.5 ms per call to 12 ms per call. Also, it reduced the space from 17.8 mb per call to 2.8mb per call.
Known limitations
[N/A]